TITLE: Multidimensional database management system for computer, distributes data blocks in storage area of database, so that number of cells in each storage area is approximately equal
PATENT-ASSIGNEE: HITACHI LTD [HITA]
PRIORITY-DATA: 1999JP-0194984 (July 8, 1999) 
PATENT-FAMILY:

PUB-NO           PUB-DATE          LANGUAGE  PAGES  MAIN-IPC
JP 2001022621 A  January 26, 2001  N/A       011    G06F 012/00

APPLICATION-DATA:

PUB-NO         APPL-DESCRIPTOR  APPL-NO         APPL-DATE
JP2001022621A  N/A              1999JP-0194984  July 8, 1999

INT-CL (IPC): G06F012/00, G06F017/30

ABSTRACTED-PUB-NO: JP2001022621A
BASIC-ABSTRACT:

NOVELTY - Process node (202) processes data cells of different dimensions. Based on the number of storing areas of the database, input data is divided into data blocks and coordinate value is assigned to each block containing the cells. The data blocks are distributed in the storing areas of database, so that the number of data cells in each area is approximately equal.

USE - For multidimensional database management for computer.

ADVANTAGE - Parallel processing and calculation velocity are improved by enabling suitable storage control of the data cells in the database.

DESCRIPTION OF DRAWING(S) - The figure shows the block diagram of database management system.

Process node 202

CHOSEN-DRAWING: Dwg.2/12

TITLE-TERMS: MULTIDIMENSIONAL DATABASE MANAGEMENT SYSTEM COMPUTER DISTRIBUTE DATA BLOCK STORAGE AREA DATABASE SO NUMBER CELL STORAGE AREA APPROXIMATE EQUAL
DERWENT-CLASS: T01
EPI-CODES: T01-H01; T01-J05B2;

SECONDARY-ACC-NO:
Non-CPI Secondary Accession Numbers: N2001-148028
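
The abstract of JP 2001022621 A above states only the goal of the distribution (roughly equal cell counts per storage area), not the procedure. As a minimal sketch of one way to reach that goal, the following uses a greedy largest-first heuristic; the heuristic, the function name `distribute_blocks` and the sample cell counts are assumptions made for illustration and are not taken from the patent.

```python
import heapq

def distribute_blocks(block_sizes, num_areas):
    """Greedy largest-first assignment of data blocks to storage areas.

    block_sizes maps a block's coordinate tuple to its number of cells.
    Returns one {coordinate: cell_count} dict per storage area.
    (Hypothetical helper; the patent abstract does not give an algorithm.)
    """
    areas = [{} for _ in range(num_areas)]
    # Min-heap of (cells stored so far, area index).
    heap = [(0, i) for i in range(num_areas)]
    heapq.heapify(heap)
    # Place big blocks first; ties are broken by coordinate for determinism.
    for coord, cells in sorted(block_sizes.items(), key=lambda kv: (-kv[1], kv[0])):
        load, idx = heapq.heappop(heap)      # currently emptiest area
        areas[idx][coord] = cells
        heapq.heappush(heap, (load + cells, idx))
    return areas

# Illustrative input: six blocks addressed by 2-D coordinates.
blocks = {(0, 0): 40, (0, 1): 10, (1, 0): 30, (1, 1): 20, (2, 0): 25, (2, 1): 15}
areas = distribute_blocks(blocks, 2)
loads = [sum(area.values()) for area in areas]
print(loads)
```

With this input both areas end up holding 70 cells; in general the heuristic only guarantees an approximately even split.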
PRIORITY-DATA: 1999US-0368241 (August 4, 1999)
PATENT-FAMILY:

PUB-NO         PUB-DATE       LANGUAGE  PAGES  MAIN-IPC
US 6408292 B1  June 18, 2002  N/A       000    G06F 017/30

LS LT LU LV MA MD MG MK MN MW MX MZ NO NZ PL PT RO RU SD SE SG SI SK SL TJ TM

APPLICATION-DATA:

PUB-NO        APPL-DESCRIPTOR  APPL-NO         APPL-DATE
US 6408292B1  N/A              1999US-0368241  August 4, 1999

INT-CL (IPC): G06F017/30



RELATED-ACC-NO: 2001-625902, 2002-414155, 2003-039626, 2003-039629, 2003-058087, 2003-220403, 2003-902144, 2004-060691, 2004-061065, 2005-231954, 2005-241102, 2005-252860, 2005-353444, 2005-512094

ABSTRACTED-PUB-NO: US 6408292B

BASIC-ABSTRACT:

NOVELTY - Each data location in multidimensional database (MDB) is specified by integer encoded business dimensions associated with data. Address data mapping unit maps integer coded MDB dimensions against integer encoded data storage address within memory associated with MDB using modular arithmetic function. Data accessing unit accesses data element in memory using map information.

DETAILED DESCRIPTION - A parallel computing platform has processors and memories for storing data elements in integer encoded address. INDEPENDENT CLAIMS are also included for the following:

(a) Data element accessing method;

(b) Data element management system;

(c) Data element management method;

(d) Internet URL directory system;

(e) Internet enabled system

USE - For accessing multidimensional database (MDB) such as data warehouse in business organization for on-line analytical processing, MDB is used in on-line e-commerce shopping system for storing consumer shopping profile information, for URL directory system used for data mixing in Internet, and other MDB based system used for predictive business modeling for applications such as database marketing, financial/risk analysis, fraud management, bioinformatics, return-on-investment justification, business intelligence application, customer relation management, enterprise information portals and systems used for supporting real-time control of packet routers, switches and other devices used in Internet, for real-time control of automated parcel routing and sortation system.

ADVANTAGE - Improved data accessing is provided by parallel computing platform. Inter process communication among parallel processors is minimized. Fast, affordable and easy access is provided to customer enabling companies to more effectively market products and service over internet. Supporting real-time control of processor in response to complex states of information reflected in MDB.

DESCRIPTION OF DRAWING(S) - The figure shows the schematic representation of data element address assignment method.
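
The abstract above mentions a modular arithmetic function that maps integer-encoded dimensions to integer-encoded storage addresses, without giving the function itself. The sketch below shows one plausible shape of such a mapping: a row-major linearization followed by a modulo over the number of memories. The scheme and the names `linear_address` and `locate` are assumptions for illustration, not the mapping claimed in US 6408292.

```python
def linear_address(coords, dim_sizes):
    """Row-major linearization of integer-encoded dimension values."""
    addr = 0
    for c, size in zip(coords, dim_sizes):
        if not 0 <= c < size:
            raise ValueError("coordinate out of range")
        addr = addr * size + c
    return addr

def locate(coords, dim_sizes, num_memories):
    """Return (memory index, offset inside that memory) using modular arithmetic."""
    addr = linear_address(coords, dim_sizes)
    return addr % num_memories, addr // num_memories

# Illustrative cube with dimension sizes 4 x 3 x 5 spread over 4 memories.
print(locate((2, 1, 3), (4, 3, 5), 4))
```

Consecutive linear addresses land in different memories, so neighbouring cells can be read in parallel; whether the patented function has this property is not stated in the abstract.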
ABSTRACTED-PUB-NO: WO 200111497A
EQUIVALENT-ABSTRACTS:

NOVELTY - Each data location in multidimensional database (MDB) is specified by integer encoded business dimensions associated with data. Address data mapping unit maps integer coded MDB dimensions against integer encoded data storage address within memory associated with MDB using modular arithmetic function. Data accessing unit accesses data element in memory using map information.

DETAILED DESCRIPTION - A parallel computing platform has processors and memories for storing data elements in integer encoded address. INDEPENDENT CLAIMS are also included for the following:

(a) Data element accessing method;

(b) Data element management system;

(c) Data element management method;

(d) Internet URL directory system;

(e) Internet enabled system

USE - For accessing multidimensional database (MDB) such as data warehouse in business organization for on-line analytical processing, MDB is used in on-line e-commerce shopping system for storing consumer shopping profile information, for URL directory system used for data mixing in Internet, and other MDB based system used for predictive business modeling for applications such as database marketing, financial/risk analysis, fraud management, bioinformatics, return-on-investment justification, business intelligence application, customer relation management, enterprise information portals and systems used for supporting real-time control of packet routers, switches and other devices used in Internet, for real-time control of automated parcel routing and sortation system.

ADVANTAGE - Improved data accessing is provided by parallel computing platform. Inter process communication among parallel processors is minimized. Fast, affordable and easy access is provided to customer enabling companies to more effectively market products and service over internet. Supporting real-time control of processor in response to complex states of information reflected in MDB.

DESCRIPTION OF DRAWING(S) - The figure shows the schematic representation of data element address assignment method.
CHOSEN-DRAWING: Dwg.8A/18

TITLE-TERMS: MULTIDIMENSIONAL DATABASE ELEMENT ACCESS SYSTEM LINE ANALYSE
ASSOCIATE
DERWENT-CLASS: T01 W01

EPI-CODES: T01-D02; T01-H01A; T01-H07C5E; T01-J05A; T01-J05B4M; T01-
TITLE: Method of indexing data in multidimensional database for on-line analytical processing, involves reorganizing selected cells in multidimensional data storage model based on data access information generated based on user queries

INVENTOR: JORDAN, P M; NG, K S; SANDERS, M J; STEWART, J
PATENT-ASSIGNEE: DESCISYS LTD [DESCN]
PRIORITY-DATA: 2003NZ-0527535 (August 11, 2003)
PATENT-FAMILY:

PUB-NO PUB-DATE LANGUAGE
037 G06F 017/30
PUB-NO APPL-DESCRIPTOR APPL-NO
August 11, 2003

NZ 527535A Div in NZ 537745

N/A

INT-CL (IPC): G06F017/30, G06F017/40, G06F017/60

ABSTRACTED-PUB-NO: NZ 527535A
BASIC-ABSTRACT:

NOVELTY - A multidimensional logical access model and a multidimensional data storage model are created. The data access information is generated based on user queries of the database. The selected cells in the multidimensional data storage model are reorganized based on the data access information to reduce the cost of access to selected cells in response to user query of the database.

DETAILED DESCRIPTION - INDEPENDENT CLAIMS are also included for the following:

(1) computerized apparatus for indexing data in multidimensional database; and

(2) computer program for indexing data in multidimensional database.

USE - For indexing data in multidimensional database used in multidimensional on-line analytical processing (MOLAP) and relational OLAP (ROLAP).

ADVANTAGE - Optimizes data access by separating logical access model from data storage. Allows the database management software to automatically reorganize data in response to queries, and hence improves database performance.

DESCRIPTION OF DRAWING(S) - The figure shows a data structure of a multidimensional database model.

CHOSEN-DRAWING: Dwg.1/9

TITLE-TERMS: METHOD INDEX DATA MULTIDIMENSIONAL DATABASE LINE ANALYSE PROCESS SELECT CELL MULTIDIMENSIONAL DATA STORAGE MODEL BASED DATA ACCESS INFORMATION GENERATE BASED USER QUERY
DERWENT-CLASS: T01

EPI-CODES: T01-J05B1; T01-J05B4M;
SECONDARY-ACC-NO:

Non-CPI Secondary Accession Numbers: N2005-551701
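
The NZ 527535A abstract above describes reorganizing stored cells according to access information gathered from user queries, but not how. The following toy sketch assumes the simplest possible policy (move the most frequently accessed cells to the front of a linear storage order) and a toy cost model (how far into storage a query must read). Both, along with the names `reorganize` and `access_cost`, are hypothetical and only meant to make the idea concrete.

```python
from collections import Counter

def reorganize(storage_order, query_log):
    """Return a new storage order with the most frequently accessed cells first."""
    hits = Counter(cell for query in query_log for cell in query)
    # sorted() is stable, so cells with equal counts keep their relative order.
    return sorted(storage_order, key=lambda cell: -hits[cell])

def access_cost(storage_order, query):
    """Toy cost model: position (1-based) of the last cell the query must read."""
    position = {cell: i for i, cell in enumerate(storage_order)}
    return max(position[cell] for cell in query) + 1

storage = ["a", "b", "c", "d", "e"]
log = [["d", "e"], ["e"], ["d", "e"], ["a"]]   # cells touched by past queries
new_storage = reorganize(storage, log)
print(new_storage)
print(access_cost(storage, ["d", "e"]), access_cost(new_storage, ["d", "e"]))
```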
ABSTRACT

On-Line Analytical Processing (OLAP) is a technology 
that encompasses applications requiring a multidimen- 
sional and hierarchical view of data. OLAP applica- 
tions often require fast response time to complex group- 
ing/aggregation queries on enormous quantities of data. 
Commercial relational database management systems 
use mainly multiple one-dimensional indexes to process 
OLAP queries that restrict multiple dimensions. However,
in many cases, multidimensional access methods
outperform one-dimensional indexing methods. 

We present an architecture for multidimensional data- 
bases that are clustered with respect to multiple hi- 
erarchical dimensions. It is based on the star schema 
and is called CSB star. Then, we focus on heuristi- 
cally optimizing OLAP queries over this schema using 
multidimensional access methods. Users can still formu- 
late their queries over a traditional star schema, which 
are then rewritten by the query processor over the CSB 
star. We exploit the different clustering features of the 
CSB star to efficiently process a class of typical OLAP 
queries. We detect special cases where the construction 
of an evaluation plan can be simplified and we discuss 
improvements of our technique. 

1. INTRODUCTION 

Decision support applications increasingly rely on On- 
Line Analytical Processing (OLAP) to analyze business 
related information. OLAP is a technology that encompasses
applications requiring a multidimensional view of
data. In such a view of data there is a set of measures 
that are the metrics of interest. The measures contain 
numeric data. Each of them is uniquely determined 
by a set of different and often independent dimensions. 
Dimensions have associated with them hierarchies that 
specify different aggregation levels of data and hence 
different granularities of viewing data. Many relational 
OLAP systems use the star schema [12] to represent 
the multidimensional data model. A multidimensional 
database organized as a star consists of a fact table and 
a table for each dimension. A dimension table comprises 
attributes for each (aggregation) level of the dimension 
and other (descriptive) attributes that characterize the 
different levels of the dimension. The fact table stores 
attributes for each numeric measure and foreign key at- 
tributes to the attribute of the finest granularity level 
of each dimension. 

OLAP applications often require fast response time 
to complex grouping/aggregation queries on enormous
quantities of data. Common techniques to improve query 
performance are materializing views, and making exten- 
sive use of clustering and indexing methods. In multidi- 
mensional databases these techniques have to be adapted 
in order to account for the multiple dimensions. Mate- 
rializing views is an efficient technique when the way to 
compute a query using the materialized views is known. 
However, the general problem of answering and optimizing
grouping/aggregation queries using multiple materialized
views [4, 3, 20] is complex. Another difficulty of
the view materializing technique is the optimal selection 
of views to materialize [11, 19]. View selection becomes 
even more complex when the query pattern is not known 
in advance [13]. Lastly, this technique incurs important 
additional space requirements and intricate algorithms 
for incrementally maintaining the materialized views [9]. 

Commercial relational database management systems 
use mainly multiple one-dimensional indexes, like com- 
pound indexes and bitmap indexes, to process OLAP 
queries that restrict multiple dimensions. The search
key in compound indexes is a concatenation of multiple
attributes (where the order of the attributes matters).
Therefore, they are useful for processing only some of
the queries that restrict these attributes. Selecting views
and compound indexes for materialization for a given
query set pattern is a difficult task [10] and depends
on the specific query set pattern. Bitmap indexes (and
their variants) [16] are very popular because of their 
compactness and support of star joins. Nevertheless, 
in many cases, multidimensional access methods (e.g., 
R-tree) outperform bit-mapped indexing methods [18]. 

Contribution. In this paper we focus on heuristically
optimizing OLAP queries in databases that are clus- 
tered with respect to multiple hierarchical dimensions 
using multidimensional access methods. The main con- 
tributions are the following: 

• We present a multidimensional database architecture 
based on the star model (called CSB star). The di- 
mension tables are organized using one-dimensional 
hierarchical clustering and encoding techniques, while 
the fact table is organized using a multidimensional 
access method. 

• We show how OLAP queries can be easily expressed 
by the users over a traditional star schema. The CSB 
star schema is intended to be a storage option only. 
The query processor rewrites user queries over the 
CSB star schema. 

• We exploit the clustering features of the CSB star 
schema to efficiently process a class of typical OLAP 
queries. The expensive star-join operations needed
in a traditional star schema can be essentially imple- 
mented as multidimensional range restrictions on the 
fact table and range restrictions on the dimension ta- 
bles. Supplementary joins are implemented as merge 
join operations on sorted tables. Grouping operations 
are performed on partially sorted relations. 

• In this context we detect special cases where joins of 
fact table tuples with tuples from the dimensions can 
be avoided, and the grouping of the tuples can be 
performed only once before all join operations. 

This work is done in the context of the European IST
project "EDITH". In this project we use a multidimensional
access method integrated into the kernel of a
database management system [17].

Outline. The next section reviews related work. Section
3 introduces the basic concepts of multidimensional hi- 
erarchical clustering adopted here. In Section 4, the 
architecture of the multidimensional database is pre- 
sented. Section 5 describes the class of queries consid- 
ered, introduces a number of physical operators, and 
shows how queries in this class can be heuristically op- 
timized by exploiting the clustering scheme of the mul- 
tidimensional database architecture. Section 7 contains 
concluding remarks and directions for further work. 

2. RELATED WORK 

Conventional query optimizers exploit the knowledge 
about the group-by clause in a query only by including 
the grouping columns in the list of interesting orders 
during join enumeration. The group-by operation can 
be pushed past one or more joins. This early grouping 
may reduce the query processing cost by reducing the 
amount of data participating in the joins. Necessary and 
sufficient conditions for deciding when this transformation
is valid are provided in [22]. The coalescing grouping
transformations [2], a generalization of the early grouping
transformation, allow us (a) to perform an early group-by
that requires an additional subsequent group-by to coalesce
multiple groups, and (b) to deal with the case
where not all the aggregating columns are present in
the node of the query evaluation plan where an early
group-by operator is placed. Other cases of
early grouping and aggregation are studied and categorized
in [23], along with their reverse transformations of
lazy grouping and aggregation. These latter transformations
postpone the application of a grouping operation
until after a join, and may reduce the number of
input rows to the group-by, if the join is selective. Both
directions of transformation are considered during query
optimization. Transformations as well as optimization
algorithms for queries with aggregate views and queries 
containing aggregate nested subqueries are presented in 
[3]. The proposed pull-up transformation (the equiva- 
lent of lazy grouping and aggregation) makes it possi- 
ble to reorder relations that belong to different query 
blocks so that these relations can be joined before the 
group-by operators are applied. The generalized projec- 
tion operator, an extension of the duplicate eliminating 
projection operator, captures the semantics of group-by, 
aggregation, duplicate-eliminating projection and dupli- 
cate preserving-projection in a common unifying frame- 
work [8]. In this framework query rewriting rules are 
able to push aggregation operators past selection condi- 
tions (and vice-versa). The pull-up transformation does 
not apply in the evaluation plans that we consider here 
because the join operations do not reduce the number 
of tuples of the joining tables. In contrast, a coalesc- 
ing grouping transformation can be very efficiently ex- 
ploited: an early grouping can be pushed past all joins. 
It is worth noting that this is due to the architecture of 
the CSB star schema that uses hierarchical clustering 
and encoding techniques, and does not apply to a tra- 
ditional star join schema. Related works dealing with 
multidimensional access methods and multidimensional 
hierarchical clustering are cited in the next section.
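
To make the early versus lazy grouping distinction of this section concrete, here is a small sketch on in-memory data. It assumes a single dimension, a fact relation of (join key, measure) pairs and the distributive aggregate sum; the relation contents and the function names `lazy_grouping` and `early_grouping` are illustrative only.

```python
from collections import defaultdict

# Fact rows (C1, M) and a dimension mapping C1 -> country (illustrative data).
fact = [(0, 10), (0, 5), (2, 20), (4, 30), (5, 40), (5, 1)]
dim = {0: "Germany", 2: "Germany", 4: "Greece", 5: "Greece"}

def lazy_grouping(fact, dim):
    """Join every fact row with the dimension first, then group by country."""
    out = defaultdict(int)
    for c, m in fact:
        out[dim[c]] += m
    return dict(out)

def early_grouping(fact, dim):
    """Group by the join key first, join the partial sums, then coalesce groups."""
    partial = defaultdict(int)
    for c, m in fact:
        partial[c] += m                 # early group-by on the join key
    out = defaultdict(int)
    for c, s in partial.items():
        out[dim[c]] += s                # join, then coalescing group-by
    return dict(out)

print(lazy_grouping(fact, dim))
print(early_grouping(fact, dim))
```

Both plans return the same sums, but the early variant joins four partial rows instead of six fact rows. The equivalence relies on sum being distributive.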

3. MULTIDIMENSIONAL HIERARCHICAL CLUSTERING

We present in this section the basic concepts of multi- 
dimensional hierarchical clustering and range query pro- 
cessing adopted in the paper. 

Multidimensional clustering and the UB-tree. A tuple of a
relation in a relational database can be viewed as a point in
a multidimensional space where the dimensions are determined
by the attributes of the tuple. In this context, the processing
of queries can be supported by multidimensional access
methods [6]. OLAP queries often impose restrictions on multiple
attributes (dimensions). Multidimensional access methods are
used to cluster data with respect to multiple dimensions.
Multidimensional clustering can substantially speed up queries
that restrict multiple dimensions. The main problem in the
design of multidimensional access methods is that there exists
no total ordering among the points in the multidimensional
space that preserves spatial proximity. One way to heuristically
deal with this problem is to discover a total order that
preserves spatial proximity to some extent. Such a total order
is called a space-filling curve. Then, a one-dimensional access
method can be used in combination with the space-filling curve
to improve the access of the points in the space. Such a
solution to the problem is provided by the UB-tree [1].
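
As an illustration of a space-filling curve, the sketch below computes Z-order (bit-interleaved) addresses, the linearization commonly associated with the UB-tree. The function name `z_address`, the fixed bit width and the sample grid are assumptions for this example; the actual address computation of [1] may differ in detail.

```python
def z_address(point, bits):
    """Interleave the bits of all coordinates, most significant bit first."""
    addr = 0
    for b in range(bits - 1, -1, -1):
        for coord in point:
            addr = (addr << 1) | ((coord >> b) & 1)
    return addr

# All points of a 4 x 4 grid, visited in Z-order.
grid = [(x, y) for x in range(4) for y in range(4)]
for p in sorted(grid, key=lambda p: z_address(p, 2)):
    print(p, z_address(p, 2))
```

Points that are close in the grid tend to receive close addresses (for example, the 2 x 2 block at the origin occupies addresses 0 to 3), which is the partial preservation of proximity that a one-dimensional index can then exploit.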

Range queries on UB-trees and the Tetris algo- 
rithm. A query that restricts all attributes (dimen- 
sions) to an interval is called range query. The multidi- 
mensional interval determined by the one-dimensional 
intervals is called query box. In order to answer a range 
query, [1] presents an algorithm for UB-trees that fetches
from the disk only those regions (pages) that properly 
intersect the query box. The Tetris algorithm [15] is a 
generalization of the multidimensional range query algo- 
rithm that efficiently combines sort operations with the 
evaluation of multi-attribute restrictions. The Tetris al- 
gorithm takes as input an attribute of a relation and a 
query box determined by the restrictions of a query on 
the relation, and returns the tuples of the relation sat- 
isfying the restrictions, ordered on the input attribute. 
Compared to the access methods of commercial systems 
for queries in the TPC-D benchmark [21], the Tetris al- 
gorithm shows significant speedups, important tempo- 
rary storage requirement reduction for the sorting pro- 
cess, and multiple times faster production of the first 
results of a sort operation [15]. 

Dimension hierarchies. A dimension hierarchy $D$ of depth $k$
is a list $L^k, \ldots, L^1$ of $k$ names which are called
dimension or hierarchy levels. With every level $L$ in $D$, a
non-empty finite set of values $dom(L)$ is associated through
the function $dom$ such that $dom(L^i) \cap dom(L^j) = \emptyset$,
$i \neq j$. In each dimension, we additionally assume an
auxiliary level $L^{k+1}$ whose domain contains the single value
all. For every two levels $L^i, L^{i+1}$, $i \in [1, k]$, a
function $parent$ from $dom(L^i)$ onto $dom(L^{i+1})$ is defined.
For every two levels $L^{i+1}, L^i$, $i \in [1, k]$, a function
$children$ from $dom(L^{i+1})$ to the power set of $dom(L^i)$ is
defined: $children(v)$ is the set of values $v'$ in $dom(L^i)$
such that $parent(v') = v$. Clearly, if $v, v' \in dom(L^i)$,
$i > 1$, and $v \neq v'$, then $children(v) \neq \emptyset$ and
$children(v) \cap children(v') = \emptyset$. A value in
$dom(L^i)$, $i > 1$, represents a set of values in $dom(L^1)$
through the function $children$. Level $L^1$ (the lowest level
in the hierarchy of dimension $D$) corresponds to the smallest
(finest) granularity of viewing data. Level $L^{k+1}$ (the top
level in the hierarchy of dimension $D$) corresponds to the
largest granularity of viewing data. A dimension hierarchy
defines a hierarchy tree: the nodes of the tree are the values
in $\bigcup_{i=1}^{k+1} dom(L^i)$, while the edges are determined
by the $parent$ function. The leaf nodes of the tree are the
values in $dom(L^1)$. The root node is the unique value of
$dom(L^{k+1})$. The path of a node (value) $n^i \in dom(L^i)$,
$i \in [1, k]$, is defined to be the concatenation of the nodes
$n^k, n^{k-1}, \ldots, n^i$ on the path in a hierarchy tree from
the root node to $n^i$.
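
The definitions above can be tried out on a tiny hierarchy. This sketch assumes a dimension of depth 2 (city, country) plus the auxiliary top level; the sample values and the dictionary representation of the parent function are illustrative choices, not part of the paper.

```python
ALL = "all"   # the single value of the auxiliary top level

# parent function: city -> country -> all (hypothetical sample data).
parent = {
    "Athens": "Greece", "Patras": "Greece",
    "Berlin": "Germany", "Munich": "Germany",
    "Greece": ALL, "Germany": ALL,
}

def children(v):
    """Set of values one level below v whose parent is v."""
    return {c for c, p in parent.items() if p == v}

def path(v):
    """Nodes on the way from just below the root down to v (root omitted)."""
    nodes = []
    while v != ALL:
        nodes.append(v)
        v = parent[v]
    return list(reversed(nodes))

print(children("Greece"))
print(path("Patras"))
```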

Hierarchy clustering and encoding. OLAP queries 
impose restrictions on different levels of the dimension 
hierarchies. Hierarchy clustering and encoding [24, 14] 
in combination with multidimensional access methods 
can be used to speed up these queries and to optimize
the storage usage.

In order to take into account dimension hierarchies in the
clustering of data, instead of the values $v$ of the lowest
level of a dimension, their path $p$ in the hierarchy tree of
the dimension is considered. The concatenated values can be
shortened using the following encoding schema which is quite
similar to that of [14]. Let $v$ be a value in the domain of
level $L^i$, $i \in [1, k]$. Let also $v'$ be the parent of $v$
in the hierarchy tree, $V$ be the cardinality of $children(v')$,
and $<_Q$ be the "less than" comparison operator of the query
language. We define a one-to-one function
$S : children(v') \rightarrow [0, V - 1]$ such that: for every
$u, u' \in children(v')$, $u <_Q u'$ implies $S(u) < S(u')$.
$S(v)$ is called the surrogate of $v$. Note that if $<_Q$ is not
defined in the query language for the values in $dom(L^i)$, $S$
is simply defined as a one-to-one function from $children(v')$
onto $[0, V - 1]$.

Let $v$ be a value in $\bigcup_{i=1}^{k} dom(L^i)$. Then, the
compound surrogate of $v$, $C(v)$, is the path of $v$ where the
concatenated values are replaced by their surrogates. One way to
compactly store compound surrogates is as fixed length strings
of bits: for each level $L^i$, $i \in [1, k]$, in the hierarchy,
the maximum surrogate $x$ is defined as
$\max\{V \mid V$ is the cardinality of $children(v)$ and
$v \in dom(L^{i+1})\}$. Then, at least $\lceil \log_2 x \rceil$
bits are reserved in the binary representation of the compound
surrogate for the surrogates of level $L^i$. The total number of
bits reserved for a level $L^i$ is called spread of $L^i$.
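
The encoding can be sketched on a small example. The code below assumes a two-level hierarchy (country, city), uses alphabetical order as the comparison $<_Q$, and packs surrogates into an integer from the top level down. The sample values and helper names (`surrogate`, `spread`, `compound_surrogate`) are assumptions for illustration.

```python
ALL = "all"
parent = {"Germany": ALL, "Greece": ALL,
          "Berlin": "Germany", "Bonn": "Germany", "Munich": "Germany",
          "Athens": "Greece", "Patras": "Greece"}
# Level domains listed from the top level L^k down to L^1.
levels = [["Germany", "Greece"],
          ["Athens", "Berlin", "Bonn", "Munich", "Patras"]]

def children(v):
    # Sorted alphabetically: this plays the role of the order <_Q.
    return sorted(c for c, p in parent.items() if p == v)

def surrogate(v):
    """Rank of v among its siblings, i.e. a value in [0, V - 1]."""
    return children(parent[v]).index(v)

def spread(level_values):
    """Bits reserved for a level: ceil(log2 x), x = largest sibling group."""
    x = max(len(children(parent[v])) for v in level_values)
    # (x - 1).bit_length() equals ceil(log2 x); keep at least one bit.
    return max(1, (x - 1).bit_length())

def compound_surrogate(v):
    """Return (code, number of bits) for the path of v."""
    nodes = []
    while v != ALL:
        nodes.append(v)
        v = parent[v]
    nodes.reverse()                       # n^k, ..., n^i
    code, nbits = 0, 0
    for node, level_values in zip(nodes, levels):
        s = spread(level_values)
        code = (code << s) | surrogate(node)
        nbits += s
    return code, nbits

for city in levels[1]:
    code, nbits = compound_surrogate(city)
    print(city, format(code, "0{}b".format(nbits)))
```

Here the country level gets 1 bit and the city level 2 bits, so Munich is encoded as 010 and Patras as 101; cities of the same country share a common prefix.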

4. THE CSB STAR SCHEMA 

We present in this section the architecture of our
multidimensional database. The schema of the multidimensional
database is a star with one fact table $F$ and dimension
tables $D_1, \ldots, D_n$.

The dimension tables. The schema of a dimension table $D_i$
corresponding to a dimension hierarchy $D_i$ of depth $k_i$
consists of:

(a) A set $H_i$ of hierarchy attributes
$\{H_i^1, H_i^2, \ldots, H_i^{k_i}\}$ that correspond one-to-one
to the $k_i$ levels of the dimension hierarchy; $H_i^1$
corresponds to the lowest level in the hierarchy and
$H_i^{k_i}$ to the highest.

(b) A set $F_i$ of feature attributes that provide descriptive
characterizations of the different levels of the dimension. A
feature attribute $F_i^j$ characterizes the hierarchy attribute
$H_i^j$. Feature attributes are optional in the schema of a
dimension table.

(c) A compound surrogate attribute $C_i$.

If $t$ is a tuple in table $D_i$, $t[H_i^j] \in dom(H_i^j)$,
$j \in [1, k_i]$, and $parent(t[H_i^j]) = t[H_i^{j+1}]$,
$j \in [1, k_i - 1]$. $t[C_i]$ is the compound surrogate of
$t[H_i^1]$.

By the definition of the compound surrogate of a 
value, it is clear that a point restriction on a hierar- 
chy attribute of a dimension table can be expressed as 
a single range restriction on the compound attribute of 
this dimension table. 
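
The remark above follows from the prefix structure of fixed-length compound surrogates: all tuples below a given hierarchy value share its surrogate prefix. A minimal sketch, assuming hypothetical 3-bit codes (1 bit for the country level, 2 bits for the city level) and an illustrative helper `surrogate_range`:

```python
def surrogate_range(prefix, prefix_bits, total_bits):
    """Closed interval of full-length compound surrogates sharing a prefix."""
    rest = total_bits - prefix_bits
    low = prefix << rest                  # prefix followed by all zeros
    high = low | ((1 << rest) - 1)        # prefix followed by all ones
    return low, high

# Hypothetical compound surrogates of the lowest-level values.
city_code = {"Berlin": 0b000, "Bonn": 0b001, "Munich": 0b010,
             "Athens": 0b100, "Patras": 0b101}

# Point restriction "country = Greece" (country surrogate 1, one bit).
low, high = surrogate_range(prefix=0b1, prefix_bits=1, total_bits=3)
selected = sorted(c for c, code in city_code.items() if low <= code <= high)
print((low, high), selected)
```

The point restriction on the country attribute becomes the single range [4, 7] on the compound surrogate, which selects exactly the Greek cities.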

The compound surrogate attribute $C_i$ is the primary key of
the dimension table $D_i$. By the definition of a dimension
hierarchy, $H_i^1$ is also a key of table $D_i$, and the
following functional dependencies between hierarchy attributes
hold on $D_i$: $H_i^j \rightarrow H_i^{j+1}$,
$j \in [1, k_i - 1]$. Furthermore, the following functional
dependencies between hierarchy and feature attributes hold on
$D_i$: $H_i^j \rightarrow F_i^j$, $j \in [1, k_i]$, where
$F_i^j$ is a feature attribute characterizing the hierarchy
attribute $H_i^j$.

We associate with a dimension table a primary index $P_i$ on
$C_i$, and a secondary (compound) index $I_i$ on
$H_i^{k_i}, \ldots, H_i^1, C_i$. Index $I_i$ is important for
computing ranges of values on $C_i$ from point or range
restrictions on the different hierarchy attributes of $D_i$.

Since the tuples of Di are clustered on Ci, range re- 
strictions on Ci can be computed efficiently. This clus- 
tering provides also a grouping of the tuples of the di- 
mension table with respect to any hierarchy attribute. 
This property is particularly useful for evaluating group- 
ing/aggregation queries as those that are extensively 
used in OLAP applications. 

The fact table. The schema of the fact table $F$ consists of:

(a) The compound surrogate attributes $C_1, \ldots, C_n$, one
for each dimension table $D_i$, $i \in [1, n]$.

(b) A set of measure attributes $M = \{M_1, \ldots, M_k\}$.

The set of attributes $C_1, \ldots, C_n$ is the primary key of
$F$. Each $C_i$ in $F$ is a foreign key and refers to attribute
$C_i$ of the dimension table $D_i$. In general, a fact table
contains a huge number of tuples. In a traditional star schema,
the lower level hierarchy attributes are stored as foreign keys
in the fact table. Instead, by storing the compound surrogate
attributes as foreign keys in the fact table, we significantly
reduce its size. Table $F$ is organized as a UB-tree on the
attributes $C_1, \ldots, C_n$. The schema of our
multidimensional database is called Compound Surrogate Based
star schema (CSB star for short).

User View. The clustering scheme of the CSB star schema is
intended to be a storage option only, without affecting the
formulation of queries by the user. User queries are easily
formulated on a simple star schema (called user star schema)
as this is defined by the view $FT$ for the fact table, and the
views $DT_1, \ldots, DT_n$ for the $n$ dimension tables:

CREATE VIEW FT AS
SELECT H_1^1, ..., H_n^1, M
FROM F, D_1, ..., D_n
WHERE F.C_1 = D_1.C_1 AND ... AND F.C_n = D_n.C_n

CREATE VIEW DT_i AS
SELECT H_i, F_i
FROM D_i

Clearly, the views $FT$ and $DT_1, \ldots, DT_n$ form a typical
star schema. User queries formulated on the user schema are
rewritten by the query processor over the tables $F$ and
$D_1, \ldots, D_n$ by replacing the views by their view
definitions. In the following we consider queries that are
rewritten over the CSB star schema.
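
A concrete, much reduced instance of the user views can be tried with SQLite. The sketch assumes a single dimension with two hierarchy levels and one feature attribute; the table contents, the column names (`H1_1`, `H1_2`, `F1_1`, `C1`) and the 3-bit surrogate values are invented for illustration, and SQLite's B-tree storage merely stands in for the UB-tree organization of the paper.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
-- Dimension table D1: city (H1_1), country (H1_2), a feature, compound surrogate.
CREATE TABLE D1 (H1_1 TEXT, H1_2 TEXT, F1_1 TEXT, C1 INTEGER PRIMARY KEY);
-- Fact table F: compound surrogate as key and foreign key, one measure.
CREATE TABLE F (C1 INTEGER PRIMARY KEY REFERENCES D1(C1), M INTEGER);
INSERT INTO D1 VALUES ('Berlin', 'Germany', 'capital', 0),
                      ('Munich', 'Germany', 'city',    2),
                      ('Athens', 'Greece',  'capital', 4),
                      ('Patras', 'Greece',  'city',    5);
INSERT INTO F VALUES (0, 10), (2, 20), (4, 30), (5, 40);
-- User star schema defined as views over the CSB star tables.
CREATE VIEW FT  AS SELECT D1.H1_1 AS H1_1, F.M AS M FROM F JOIN D1 ON F.C1 = D1.C1;
CREATE VIEW DT1 AS SELECT H1_1, H1_2, F1_1 FROM D1;
""")

# Query as a user would write it on the user star schema.
user_rows = con.execute("""
    SELECT DT1.H1_2, SUM(FT.M)
    FROM FT JOIN DT1 ON FT.H1_1 = DT1.H1_1
    WHERE DT1.H1_2 = 'Greece'
    GROUP BY DT1.H1_2
""").fetchall()

# The same restriction expressed as a range on the compound surrogate.
csb_rows = con.execute("SELECT SUM(M) FROM F WHERE C1 BETWEEN 4 AND 7").fetchall()
print(user_rows, csb_rows)
```

Both queries yield the total 70 for Greece; the second one touches only the fact table, which hints at why range restrictions on compound surrogates are attractive.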

5. OPTIMIZING OLAP QUERIES 

We show in this section how to heuristically optimize 
OLAP queries by exploiting the clustering scheme and 
the access methods of the multidimensional database 
architecture. 



5.1 The class of queries considered 

We consider OLAP queries of the form shown below. 
Typical cases of OLAP operations can be expressed by 
this SQL query. 

SELECT X, A 

FROM F, Di, . , . . D n 

WHERE c J AND c H AND c F 

GROUP BY G 

HAVING c A 

ORDER BY O 

X is a set of hierarchy and/ or feature attributes (called 
projected grouping attributes). A is a set of aggregated 
measures (aggregate functions on measure attributes). 
Using the catergorization of aggregate functions intro- 
duced in [7], we focus on distributive SQL aggregate 
functions: min, max, sum, count. G is a set of hi- 
erarchy- and/or feature attributes (called grouping at- 
tributes). c J is a conjunction of eqtli-joilling conditions 
on the compound surrogates. is a conjunction of 

comparisons involving exclusively hierarchy attributes. 
C F is a conjunction of comparisons involving exclusively 
feature attributes; is a conjunction of comparisons 
involving exclusively aggregated measures from A. 0 
is a list of attributes from X U A. * The joining condi- 
tions are of the form F.Ci = Di.C%. The comparisons 
involving an attribute A are of the form A 8 C where 9 
is one of the comparison operators <, <, =, >, >, and 
C is a constant value. 

We assume that at least one hierarchy attribute from
each dimension is involved in a comparison in the WHERE
clause of the query. We also assume that if comparisons
of the form H_i^j θ c, where θ ∈ {<, ≤, ≥, >}, appear in
the WHERE clause of a query, then the parent function
of the dimension hierarchy D_i is monotone. A parent
function is monotone if for every v, v' ∈ dom(H_i^j),
j = 1, ..., k, if v < v' then parent(v) ≤ parent(v'). This
assumption guarantees that a range restriction on a
hierarchy attribute of a dimension table can be expressed
as a single range restriction on the compound surrogate
of the table. In this section we consider queries whose
hierarchy and feature attribute restrictions can be
expressed as a multidimensional range restriction on the
compound surrogate attributes of all the dimension tables
(i.e., a single query box).
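
The monotonicity assumption can be illustrated with a small sketch. The helper name is_monotone and the sample day/month and city/region values are hypothetical and serve only as an illustration; they are not taken from the paper.

```python
from itertools import combinations

def is_monotone(parent, domain):
    """Return True if v < v' implies parent[v] <= parent[v'] over the domain."""
    return all(parent[v] <= parent[w]
               for v, w in combinations(sorted(domain), 2))

# Hypothetical level: day-of-year numbers mapped to month numbers.
day_to_month = {1: 1, 15: 1, 32: 2, 45: 2, 60: 3}
# Hypothetical non-monotone parent function: city codes mapped to region codes.
city_to_region = {1: 2, 2: 1, 3: 2}

monotone_ok = is_monotone(day_to_month, day_to_month.keys())
monotone_bad = is_monotone(city_to_region, city_to_region.keys())

# With a monotone parent function, the children of a contiguous range of
# parents form a contiguous range themselves.
days_in_months_1_to_2 = [d for d in sorted(day_to_month)
                         if 1 <= day_to_month[d] <= 2]
```

In the first mapping a range of months corresponds to one contiguous range of days, which is the property the single query box relies on; in the second mapping region 2 covers cities 1 and 3 but not city 2, so no single range exists.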

5.2 Physical operators 

In order to construct an OLAP query evaluation plan, 
we use a number of physical operators that are presented 
below. Some of them are the traditional relational op- 
erators and the others are specific to the organization 
of the multidimensional database. 

By abuse of notation, we view a compound surrogate
attribute C_i as a composite attribute consisting of
surrogate attributes S_i^k, ..., S_i^1. If t is a tuple in
the dimension table D_i, t[S_i^j], j ∈ [1, k], is the binary
representation of the surrogate of t[H_i^j]
(t[S_i^j] = s(t[H_i^j])). t[S_i^j] is part of t[C_i] and its
length in bits is equal to the spread of attribute H_i^j.
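
A minimal sketch of this composite view follows. It assumes that the surrogate of the highest hierarchy level occupies the high-order bits of the compound surrogate; the function names and the spreads (2, 4 and 5 bits) are hypothetical.

```python
def encode_compound(surrogates, spreads):
    """Concatenate per-level surrogates (highest level first) into one integer."""
    value = 0
    for s, bits in zip(surrogates, spreads):
        if not 0 <= s < (1 << bits):
            raise ValueError("surrogate does not fit in its spread")
        value = (value << bits) | s
    return value

def decode_compound(value, spreads):
    """Split a compound surrogate back into its per-level surrogates."""
    parts = []
    for bits in reversed(spreads):
        parts.append(value & ((1 << bits) - 1))
        value >>= bits
    return list(reversed(parts))

spreads = [2, 4, 5]                      # hypothetical spreads: year, month, day
c = encode_compound([1, 3, 17], spreads)
levels = decode_compound(c, spreads)
later = encode_compound([1, 4, 0], spreads)
```

Under this bit layout, integer order on the compound surrogate coincides with lexicographic order on the list of surrogates, which is why a sort on C_i is also a sort on any prefix of its surrogate attributes.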

We denote by π_X the projection with duplicate retention
operator on the set of attributes X, and by Π_X the
set-theoretic projection operator (SELECT DISTINCT) on
X. σ_c denotes the selection operator with selection
condition c. S[Y] denotes the sorting operator on the list
of attributes Y. ⋈_Y denotes the natural join operator
on the list of attributes Y. The natural join operator
is applied on tables that are sorted on the list of
attributes Y and is implemented as merge join. π_{X,A}
denotes the generalized projection operator [8]
(grouping/aggregation operator), where X is a set of
grouping attributes and A is a set of aggregate functions
on measure attributes.

The Tetris operator, denoted
T[[l_1, u_1], ..., [l_n, u_n]; C_i], i ∈ [1, n], can be
applied to the fact table and represents the Tetris
algorithm. [l_j, u_j], j ∈ [1, n], is a range of values for
the compound surrogate attribute C_j, and C_i is the
compound surrogate attribute on which the resulting table
is ordered (refer to Section 3). The schema of the
resulting table is that of the fact table.

The Range operator, denoted R[c_i^H; B], can be applied
to the compound index I_i of a dimension table D_i. c_i^H
is a condition of the type described in Subsection 5.1,
and involves hierarchy attributes of dimension table D_i.
B is a boolean variable that takes values '1' and '0'.
R[c_i^H; 0] on I_i returns a range of values [l_i, u_i] on
C_i such that restricting C_i of D_i in this range is
equivalent to applying the restriction c_i^H to D_i.
R[c_i^H; 1] on I_i returns, besides [l_i, u_i], a table T.
The schema of T is H_i^1, ..., H_i^k, C_i and its content
is the set of tuples t over H_i^1, ..., H_i^k, C_i in I_i
such that l_i ≤ t[C_i] ≤ u_i.
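
The behaviour of the Range operator can be sketched as follows. The compound index is modelled as a plain list of (hierarchy values, compound surrogate) pairs and is scanned linearly; a real index would locate the bounds by searching. The function name, the data layout and the sample entries are assumptions made for illustration only.

```python
def range_operator(index, predicate, b):
    """Sketch of R[c; B] on a compound index.

    index: list of (hierarchy_values, c) entries.
    predicate: the hierarchy condition, applied to hierarchy_values.
    Returns (l, u) when b == 0 and ((l, u), T) when b == 1;
    returns None when no entry satisfies the condition.
    """
    qualifying = [c for h, c in index if predicate(h)]
    if not qualifying:
        return None
    l, u = min(qualifying), max(qualifying)
    if b == 0:
        return (l, u)
    table = [(h, c) for h, c in index if l <= c <= u]
    return (l, u), table

# Hypothetical index entries: ((year, month), compound surrogate).
index = [((1, 1), 17), ((1, 2), 18), ((1, 3), 19), ((2, 1), 33)]
cond = lambda h: h[0] == 1 and h[1] >= 2

bounds_only = range_operator(index, cond, 0)
bounds_and_table = range_operator(index, cond, 1)
empty = range_operator(index, lambda h: False, 0)
```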

5.3 Construction of the evaluation plan 

We show now how to optimize OLAP queries of the 
type presented in Subsection 5.1. Our approach is heuris- 
tic and is not based on a specific cost model. For ease 
of presentation we use an example query that is general 
enough to encompass different optimization cases, and 
we show how our technique can be applied to produce 
an evaluation plan. 

Example 5.1. We consider the following query de- 
fined over a four dimensional schema. 

SELECT Ff t if!, #3, sum(M) 

FROM F, D_1, D_2, D_3, D_4

WHERE F.C_1 = D_1.C_1 AND F.C_2 = D_2.C_2 AND

F.C_3 = D_3.C_3 AND F.C_4 = D_4.C_4 AND
Ci AND c^AND cf. AND AND cf 

GROUP BY F?, flf, Hi, F$ 

HAVING c^A



ORDER BY H^ Hi n 

A query evaluation plan for this query that takes ad- 
vantage of our multidimensional database architecture 
is shown in Figure 1. The expensive star-join operations 
required in a traditional star schema are here essentially 
implemented by a multidimensional range restriction on 
the fact table [14]. The presence of the compound sur- 
rogate attributes in the fact table allows for an early 
grouping and aggregation operation. This operation is 
facilitated by the fact that the selected fact table tuples 
are retrieved sorted on a compound surrogate attribute. 
The evaluation plan comprises two kinds of nodes:
operation nodes representing operations, and data nodes
representing input data, and intermediate and final
results. An operation node is depicted by a small circle
and is labeled by the operation(s) it represents. Some 
operations may be pipelined in which case the corre- 
sponding temporary results are not actually stored on 
the disk. The fact table and the dimension tables are de- 
picted by bigger circles, the compound indexes by rect- 
angles, and the ranges of computed compound surrogate 
attribute values by triangles. Before discussing, in the 
following, the different steps of the evaluation plan, we 
provide some definitions. 

Definition 5.1. A (hierarchy or feature) attribute is
called restricted attribute if it is involved in a selection
condition (c^H or c^F) in the query.

An attribute is called imported attribute if it is a
projected grouping hierarchy attribute or a grouping
feature attribute in the query. A dimension containing
imported attributes is called joining dimension. □

An imported attribute needs to be added to selected fact 
table tuples. A grouping hierarchy attribute that is not 
projected in the query need not be added to the fact 
table tuples since the compound surrogate attributes 
can be used for performing the grouping operation. For 
each joining dimension, a join operation is needed in 
order to add the imported attributes of the dimension 
to selected fact table tuples. 

Definition 5.2. A joining dimension D_i is called
candidate last joining dimension if one of the following
two conditions holds:

(a) The first attribute in the list of sorting attributes
in the query (the list O of attributes in the ORDER
BY clause of the query) is a hierarchy attribute of
D_i.

(b) The first attribute in the list of sorting attributes
in the query is not a hierarchy attribute (or there is
no such list), there is a grouping feature attribute
in the query, and there is a grouping hierarchy
attribute of D_i in the query. □
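
Definition 5.2 can be read as a simple predicate. The sketch below assumes a hypothetical schema dictionary that maps each attribute name to its dimension and kind ('H' for hierarchy, 'F' for feature), and assumes that the dimension passed in is already known to be a joining dimension.

```python
def is_candidate_last(dim, order_by, group_by, schema):
    """Check conditions (a) and (b) of Definition 5.2 for a joining dimension.

    schema maps attribute name -> (dimension, kind), kind 'H' or 'F'.
    Attributes missing from schema (e.g. aggregated measures) are neither.
    """
    first = schema.get(order_by[0]) if order_by else None
    if first is not None and first[1] == "H":
        return first[0] == dim                       # condition (a)
    has_group_feature = any(
        schema.get(a, (None, None))[1] == "F" for a in group_by)
    has_group_hier = any(schema.get(a) == (dim, "H") for a in group_by)
    return has_group_feature and has_group_hier      # condition (b)

# Hypothetical attributes of three dimensions.
schema = {"year": ("D1", "H"), "city": ("D2", "H"), "color": ("D3", "F")}
group_by = ["year", "city", "color"]

a_holds = is_candidate_last("D1", ["year"], group_by, schema)
a_fails = is_candidate_last("D2", ["year"], group_by, schema)
b_holds = is_candidate_last("D2", [], group_by, schema)
b_fails = is_candidate_last("D3", [], group_by, schema)
```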

Selecting the dimension involved in the last join
operation to be a candidate last joining dimension
simplifies the final grouping and sorting operations.

Computing ranges of compound surrogate values. The
first step in the construction of the query evaluation
plan is the application of the Range operator R[c_i^H; B]
to the compound index I_i of each dimension. R[c_i^H; B]
computes the lower and the upper bound of the range
of values [l_i, u_i] of the compound surrogate using the
comparisons in c_i^H and the compound index I_i. The
range [l_i, u_i] is always provided as input to the Tetris
operator. It can also be used for a selection operation
on the dimension table D_i. If only hierarchy (and no
feature) attributes from a dimension are involved in a
query, then their values can be obtained directly from
the compound index I_i, without accessing the dimension
table D_i. In this case the parameter B of R[c_i^H; B] is
set to '1'. In general the application of the Range
operator follows the rules below:
(a) If there is no imported attribute or restricted
feature attribute of the dimension in the query, then
B is set to '0' and the computed range is used only
by the Tetris operator.

(b) If the imported attributes of the dimension are only
hierarchy attributes and there is no restricted
feature attribute of the dimension in the query, then
B is set to '1'. The computed range is provided
to the Tetris algorithm only, while the set of tuples
retrieved from I_i is used in a subsequent join
operation.

(c) If the imported attributes of the dimension include
a feature attribute or there is a restricted feature
attribute of the dimension in the query, then B is
set to '0'. The computed range is provided both to
the Tetris algorithm and to a selection operation
on the corresponding dimension table.
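
Rules (a) to (c) amount to a small decision function. The sketch below is one possible encoding; the function name, the consumer labels "tetris" and "dimension_selection", and the sample attribute names are hypothetical.

```python
def plan_range_operator(imported, restricted_features):
    """Decide B and the consumers of the range for one dimension.

    imported: list of (attribute, kind) imported from the dimension,
              kind 'H' (hierarchy) or 'F' (feature).
    restricted_features: True if a feature attribute of the dimension
              appears in a selection condition.
    Returns (B, consumers_of_range, index_tuples_feed_a_join).
    """
    needs_table = restricted_features or any(k == "F" for _, k in imported)
    if needs_table:                      # rule (c)
        return 0, ("tetris", "dimension_selection"), False
    if imported:                         # rule (b): hierarchy attributes only
        return 1, ("tetris",), True
    return 0, ("tetris",), False         # rule (a)

rule_a = plan_range_operator([], False)
rule_b = plan_range_operator([("month", "H")], False)
rule_c = plan_range_operator([("month", "H"), ("color", "F")], False)
rule_c_restricted = plan_range_operator([], True)
```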

Restricting the dimension tables. If the imported
attributes of the dimension include a feature attribute
or there is a restricted feature attribute of the
dimension in the query, the dimension table needs to be
accessed. The primary index on the compound surrogate
attribute C_i of the dimension is used to retrieve the
tuples of the dimension table that fall within the range
of values computed by the Range operator. Those of
these tuples that satisfy the condition c_i^F are retained,
projected over an appropriate set of attributes. If this
computation returns an empty set of tuples, the whole
processing of the query is ended since the answer is an
empty set. Otherwise, the computed tuples are
subsequently joined with tuples derived from the fact
table. The set S of attributes of D_i to be projected is
determined as follows:

(a) S includes all the imported attributes of D_i, and

(b) From the surrogate attributes S_i^k, ..., S_i^1 of the
compound surrogate C_i, S includes the surrogate
attributes S_i^k, ..., S_i^j, where j is the minimal level
of the imported attributes of D_i.

If the minimal level j is high enough in the dimension
hierarchy, this duplicate elimination projection is
expected to significantly reduce the number of selected
tuples.

It is important to note that the tuples from the
dimension table are retrieved sorted on the compound
surrogate attribute C_i. Since C_i is the concatenation of
the surrogate attributes S_i^k, ..., S_i^1, these tuples are
also sorted with respect to the list of surrogate
attributes S_i^k, ..., S_i^j. As a consequence, the
elimination of duplicates required for the projection
operation can be performed without extra cost.
Furthermore, the sort order of the output tuples can be
exploited in a subsequent merge join operation with
tuples derived from the fact table.
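
Because the input is already sorted on the projected surrogate prefix, duplicates are adjacent and can be dropped in a single pass. The sketch below illustrates this with itertools.groupby; the helper name and the sample rows are hypothetical.

```python
from itertools import groupby

def project_distinct_sorted(rows, key):
    """Set-theoretic projection over input already sorted on `key`.

    Equal keys are adjacent, so one pass suffices (no sort, no hash table).
    """
    return [k for k, _ in groupby(rows, key=key)]

# Hypothetical dimension tuples (year, month, day surrogates), listed in the
# order induced by the compound surrogate.
rows = [(1, 1, 1), (1, 1, 2), (1, 2, 1), (2, 1, 1), (2, 1, 5)]

# Project on the prefix (year, month), i.e. minimal imported level = month.
distinct = project_distinct_sorted(rows, key=lambda r: r[:2])
```

The output keeps the input order, so it remains sorted on the join attributes.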

Multidimensional range selection and sorting. The
Tetris operator takes the ranges of values on the
compound surrogate attributes computed in the first step,
and retrieves from the fact table the qualifying tuples
sorted on a compound surrogate attribute C_i. In general,
the choice of the sorting attribute C_i does not affect
the performance of the Tetris operator. We assume that
the cache memory requirements of the Tetris algorithm
are satisfied by the available main memory [15]. The
sorting attribute C_i for the Tetris algorithm is chosen
from a dimension according to the following rules:



(a) C_i is chosen from a joining dimension that is not a
candidate last joining dimension.

(b) If C_i cannot be chosen by rule (a), it is chosen
from a joining dimension.

(c) If C_i cannot be chosen by rules (a) or (b), it is
chosen from a dimension that has a hierarchy attribute
involved as a grouping attribute in the query.

(d) If C_i cannot be chosen by the previous rules, it is
chosen arbitrarily.
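
The four rules form a priority cascade, which can be sketched as below. The function name and the representation of dimensions as strings and sets are assumptions; for rule (d) the sketch simply takes the first dimension as its arbitrary choice.

```python
def choose_sort_dimension(dims, joining, candidate_last, grouping_hier):
    """Pick the dimension whose compound surrogate orders the Tetris output.

    dims: all dimensions in a fixed order (list).
    joining, candidate_last, grouping_hier: sets of dimensions.
    """
    # Rules (a), (b), (c) in priority order.
    for pool in (joining - candidate_last, joining, grouping_hier):
        for d in dims:
            if d in pool:
                return d
    return dims[0]  # rule (d): arbitrary choice, here the first dimension

dims = ["D1", "D2", "D3", "D4"]
by_rule_a = choose_sort_dimension(dims, {"D1", "D2"}, {"D1"}, set())
by_rule_b = choose_sort_dimension(dims, {"D1"}, {"D1"}, set())
by_rule_c = choose_sort_dimension(dims, set(), set(), {"D3"})
by_rule_d = choose_sort_dimension(dims, set(), set(), set())
```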

Processing of the selected fact table tuples. The
presence of the compound surrogate attributes
C_1, ..., C_n in the fact table allows for an early
grouping and aggregation of the fact table tuples
resulting from the Tetris algorithm [5]. This operation
can be performed efficiently due to the fact that the
tuples resulting from the Tetris operation are already
sorted (and thus grouped) with respect to one compound
surrogate attribute C_i, i ∈ [1, n]. The set of attributes
on which the grouping is performed comprises the
surrogate attributes S_i^k, ..., S_i^j from each dimension
D_i that contains a grouping attribute in the query; j is
the minimal level of the (hierarchy and feature) grouping
attributes of D_i in the query. Usually, in OLAP
applications, a fact table contains a huge number of
tuples which are grouped to produce a small number of
aggregated results. Therefore, this early grouping
operation is expected to drastically reduce the number
of fact table tuples at an early stage of their
processing.
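
A minimal sketch of early grouping on surrogate prefixes follows. It assumes that the highest hierarchy level occupies the high-order bits of each compound surrogate, so that a prefix is obtained by shifting away the low-order bits. For brevity it aggregates with a dictionary rather than exploiting the sort order of the Tetris output; the function name and the sample values are hypothetical.

```python
from collections import defaultdict

def early_group_sum(fact_rows, drop_bits):
    """Group fact tuples on compound surrogate prefixes and sum the measure.

    fact_rows: iterable of (compound_surrogates, measure).
    drop_bits[i]: number of low-order bits of C_i below the grouping level,
                  or None if dimension i has no grouping attribute.
    """
    totals = defaultdict(int)
    for cs, m in fact_rows:
        key = tuple(c >> b for c, b in zip(cs, drop_bits) if b is not None)
        totals[key] += m
    return dict(totals)

# Hypothetical fact tuples over two dimensions: ((C_1, C_2), measure).
facts = [((625, 3), 10), ((626, 3), 5), ((640, 7), 2)]
# Group on dimension 1 only, dropping a 5-bit lowest level.
grouped = early_group_sum(facts, [5, None])
```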

If there is no grouping feature attribute in the query,
then the final grouping and aggregation operation that
will be presented in the next step is not needed. The
reason is that the compound surrogate attributes have
already been used instead, for grouping on hierarchy
attributes. In this case the selection operation on the
aggregated measures (σ_{c^A}) can follow immediately
after the early grouping operation to further reduce the
number of aggregated fact table tuples that are left to
be processed.

If there is at least one joining dimension, one of them
has provided its compound surrogate attribute as a
sorting attribute to the Tetris operator. In this case, the
aggregated tuples are equi-joined on the common
attributes with the selected tuples of this dimension.
Since both sets of tuples are sorted on their common
attributes, a merge join algorithm can be efficiently
applied. Each aggregated tuple joins with exactly one
tuple from the dimension table. This operation does not
alter the number of tuples resulting from the
grouping/aggregation operation. It does add to these
tuples the imported attributes of the joining dimension.
Grouping hierarchy attributes of the dimension table that
are not projected grouping attributes in the query are not
needed since the fact table tuples have already been
grouped using the surrogate attributes.
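
The merge join described above can be sketched as a single forward pass over both sorted inputs. The function name and the tuple layout (join key stored first in each dimension row) are assumptions made for this illustration.

```python
def merge_join_unique(agg_rows, dim_rows, key_agg, key_dim):
    """Merge join of two inputs sorted on the join key.

    dim_rows has unique keys, stored as the first field of each row.
    Each aggregated row matches at most one dimension row, so the
    dimension cursor never moves backwards.
    """
    out, j = [], 0
    for row in agg_rows:
        k = key_agg(row)
        while j < len(dim_rows) and key_dim(dim_rows[j]) < k:
            j += 1
        if j < len(dim_rows) and key_dim(dim_rows[j]) == k:
            out.append(row + dim_rows[j][1:])   # append imported attributes
    return out

# Hypothetical inputs: (surrogate prefix, sum) and (surrogate prefix, month).
agg_rows = [(19, 15), (20, 2)]
dim_rows = [(18, "Feb"), (19, "Mar"), (20, "Apr")]
joined = merge_join_unique(agg_rows, dim_rows,
                           key_agg=lambda r: r[0], key_dim=lambda r: r[0])
```

The number of output rows equals the number of aggregated rows, in line with the observation that the join only adds the imported attributes.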

A similar operation is performed on the resulting tuples
for each other joining dimension. This operation
has to be preceded by a sort operation with respect to
the joining attributes, and by a project operation that
eliminates the attributes not needed in subsequent
operations. The sort operation has to be performed only
on the tuples resulting from the fact table since the
dimension tuples are already sorted with respect to the
joining attributes. The order of the join operations does
not significantly matter. The only rule that has to be
respected is to choose a candidate last joining dimension
for the final join operation. This way the sort order of
the resulting tuples can be exploited in the subsequent
operations.

Final grouping/aggregation and sorting. In the last
step of the evaluation plan the tuples resulting from the
last join operation are grouped and aggregated. The
aggregated tuples that satisfy the condition c^A on
aggregated measures are retained, projected over the
attributes required in the output. As mentioned
previously, these operations can be avoided when there is
no grouping feature attribute in the query. If the output
tuples are required sorted, a final sorting operation is
also performed. These operations exploit the sort order
of the tuples resulting from the previous step.

6. CONCLUSION 

OLAP applications view data structured in multiple 
hierarchically organized dimensions. Complex group- 
ing/aggregation queries for OLAP applications process 
enormous quantities of data and require fast response 
time. Recent research suggests that multidimensional 
access methods outperform one-dimensional indexing 
techniques. 

We have presented the CSB star, an architecture for 
a multidimensional database that is based on the star 
schema. This architecture uses one-dimensional hier- 
archical clustering and encoding techniques to organize 
the dimension tables and multidimensional access meth- 
ods to organize the fact table. Users can express their 
queries over a traditional star schema, which are then 
rewritten by the query processor over a CSB star schema. 
We have shown how the features of this schema allow 
the heuristic optimization of a class of typical OLAP 
queries: expensive star-join operations are essentially 
reduced to multidimensional and one-dimensional range 
restrictions, supplementary joins are implemented as
merge join operations on sorted tables, and grouping
operations are performed on partially sorted data. We
have detected special cases where supplementary joins 
are avoided, and a grouping operation can be pushed 
past all join operations. 

An interesting extension of the present work concerns
considering a larger class of queries. In particular,
relaxing a number of the restrictions adopted here can
result in queries that determine multiple query boxes on
the compound surrogate attributes. Our results apply to
this case too by considering the different query boxes
separately. However, a cost-based optimization technique
that considers different groupings of these query
boxes is expected to provide further improvements.
Figure 1: A query evaluation plan